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Abstract. Generalized linear models (GLMs) have been used quite ef- 
fectively in the modeling of a mean response under nonstandard con- 
ditions, where discrete as well as continuous data distributions can be 
accommodated. The choice of design for a GLM is a very important 
task in the development and building of an adequate model. However, 
one major problem that handicaps the construction of a GLM design 
is its dependence on the unknown parameters of the fitted model. Sev- 
eral approaches have been proposed in the past 25 years to solve this 
problem. These approaches, however, have provided only partial solu- 
tions that apply in only some special cases, and the problem, in general, 
remains largely unresolved. The purpose of this article is to focus atten- 
tion on the aforementioned dependence problem. We provide a survey 
of various existing techniques dealing with the dependence problem. 
This survey includes discussions concerning locally optimal designs, se- 
quential designs, Bayesian designs and the quantile dispersion graph 
approach for comparing designs for GLMs. 

Key words and phrases: Bayesian design, dependence on unknown 
parameters, locally optimal design, logistic regression, response surface 
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1. INTRODUCTION 

In many experimental situations, the modeling of 
a response of interest is carried out using regression 
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techniques. The precision of estimating the unknown 
parameters of a given model depends to a large ex- 
tent on the design used in the experiment. By design 
is meant the specification of the levels of the factors 
(control variables) that influence the response. 

The tools needed for the adequate selection of a 
design and the subsequent fitting and evaluation of 
the model, using the data generated by the design, 
have been developed in an area of experimental de- 
sign known as response surface methodology (RSM) . 
This area was initially developed for the purpose 
of determining optimum operating conditions in the 
chemical industry. It is now used in a variety of fields 
and applications, not only in the physical and en- 
gineering sciences, but also in the biological, clini- 
cal, and social sciences. The article by Myers, Khuri 
and Carter (1989) provides a broad review of RSM 
(see also Myers, 1999). In addition, the three books 
by Box and Draper (1987), Myers and Montgomery 
(1995) and Khuri and Cornell (1996) give a compre- 
hensive coverage of the various techniques used in 
RSM. 
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Most design methods for RSM models in the 
present statistical literature were developed around 
agricultural, industrial and laboratory experiments. 
These designs are based on the standard general lin- 
ear model where the responses are assumed to be 
continuous (quite often, normally distributed) with 
uncorrelated errors and homogeneous variances. How- 
ever, clinical or epidemiological data, for example, 
quite frequently do not satisfy these assumptions. 
For example, data which consist primarily of hu- 
man responses tend to be more variable than is ex- 
pected under the assumption of homogeneous vari- 
ances. There is much less control over data collected 
in a clinical setting than over data observed in a lab- 
oratory or an industrial setting. Furthermore, most 
biological data are correlated due to particular ge- 
netic relationships. There are also many situations 
in which clinical experiments tend to yield discrete 
data. Dose-response experiments are one good ex- 
ample where the responses are binary in most cases. 
Furthermore, several responses may be observed for 
the same patient. For example, in addition to the 
standard binary response of success or failure, some 
measure of side effects of the treatment may be of 
importance. Since responses are measured from the 
same subject, they will be correlated, and hence con- 
sidering each response to be independent of others 
may lead to erroneous inferences. 

Due to the nature of the data as described above, 
doing statistical analysis of the data using standard 
linear models will be inadequate. For such data, 
generalized linear models (GLMs) would be more 
appropriate. The latter models have proved to be 
very effective in several areas of application. For ex- 
ample, in biological assays, reliability and survival 
analysis and a variety of applied biomedical fields, 
GLMs have been used for drawing statistical conclu- 
sions from acquired data sets. In multicenter clini- 
cal trials, estimates of individual hospital treatment 
effects are obtained by using GLMs (see Lee and 
Nelder, 2002). In entomology, GLMs are utilized to 
relate changes in insect behavior to changes in the 
chemical composition of a plant extract (Hern and 
Dorn, 2001). Diaz et al. (2002) adopted some GLMs 
in order to study the spatial pattern of an important 
tree species. In climatology, GLMs are used to study 
the basic climatological pattern and trends in daily 
maximum wind speed in certain regions (see Yan 
et al, 2002). Also, Jewell and Shiboski (1990) used 
GLMs to examine the relationship between the risk 



of HIV (human immunodeficiency virus) infection 
and the number of contacts with other partners. 

In all of the above examples and others, the cor- 
nerstone of modeling is the proper choice of designs 
needed to fit GLMs. The purpose of a design is the 
determination of the settings of the control variables 
that result in adequate predictions of the response of 
interest throughout the experimental region. Hence, 
GLMs cannot be used effectively unless they are 
based on efficient designs with desirable properties. 
Unfortunately, little work has been done in devel- 
oping such designs. This is mainly due to a serious 
problem caused by the dependence of a design on 
the unknown parameters of the fitted GLM. 

In this article, we address the aforementioned de- 
sign dependence problem by providing a survey of 
various approaches for tackling this problem. The 
article is organized as follows: Section 2 presents 
an introduction to GLMs. Section 3 describes cri- 
teria for the choice of a GLM design, and intro- 
duces the design dependence problem. The various 
approaches for solving this problem are discussed 
in Sections 4 (locally optimal designs), 5 (sequen- 
tial designs), 6 (Bayesian designs) and 7 (quantile 
dispersion graphs). The article ends with some con- 
cluding remarks in Section 8. 

2. GENERALIZED LINEAR MODELS 

As a paradigm for a large class of problems 
in applied statistics, generalized linear models have 
proved very effective since their introduction by 
Nelder and Wedderburn (1972). GLMs are a unified 
class of regression models for discrete and continu- 
ous response variables, and have been used routinely 
in dealing with observational studies. Many statisti- 
cal developments in terms of modeling and method- 
ology in the past twenty years may be viewed as spe- 
cial cases of GLMs. Examples include logistic regres- 
sion for binary responses, linear regression for con- 
tinuous responses and log-linear models for counts. 
A classic book on the topic is McCullagh and Nelder 
(1989). In addition, the more recent books by Lind- 
sey (1997), McCulloch and Searle (2001), Dobson 
(2002) and Myers, Montgomery and Vining (2002) 
provide added insights into the application and use- 
fulness of GLMs. 

There are three components that define GLMs. 
These components are: 
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(i) The elements of a response vector y are dis- 
tributed independently according to a certain prob- 
ability distribution considered to belong to the ex- 
ponential family, whose probability mass function 
(or probability density function) is given by 



(2.1) f(y,i 



exp 



Qy - KO) 

a(<j>) 



where a(-), b(-) and c(-) are known functions; 6 is a 
canonical parameter and (j> is a dispersion parameter 
(see McCullagh and Nelder, 1989, page 28). 

(ii) A linear regression function, or linear predic- 
tor, in k control variables x\, X2, ■ ■ ■ ,%k of the form 

r ? (x) = f T (x)/3, 



(2.2) 



where f(x) is a known p-component vector- valued 
function of x = (x±,X2, ■ ■ ■ ,Xk) T , (3 is an unknown 
parameter vector of order p x 1 and f r (x) is the 
transpose of f (x). 

(hi) A link function g(/x) which relates rj in (2.2) 
to the mean response /x(x) so that t?(x) = g\p(pc)), 
where <?(•) is a monotone differentiable function. 
When g is the identity function and the response 
has the normal distribution, we obtain the special 
class of linear models. 

GLMs have several areas of application ranging 
from medicine to economics, quality control and sam- 
ple surveys. Applications of the logistic regression 
model, expanded with the popularity of case-control 
designs in epidemiology, now provide a basic tool for 
epidemiologic investigation of chronic diseases. Sim- 
ilar methods have been extensively used in econo- 
metrics. Probit and logistic models play a key role in 
all forms of assay experiments. The log-linear model 
is the cornerstone of modern approaches to the anal- 
ysis of contingency table data, and has been found 
particularly useful for medical and social sciences. 
Poisson regression models are widely employed to 
study rates of events such as disease outcomes. The 
complementary log-log model arises in the study of 
infectious diseases (e.g., in HIV disease transmis- 
sion and AIDS as illustrated in Jewell and Shiboski, 
1990), and more generally, in the analysis of sur- 
vival data associated with clinical and longitudinal 
follow-up studies. 

Traditionally, the exponential family model 
adopted for the study of GLMs deals with a lin- 
ear function of the response variable involving the 
unknown parameters of interest. This covers most 
of the experimental situations arising in practice. 



However, some special members, such as the curved 
exponential family of distributions, are not covered. 
Thus, GLMs should be further generalized to in- 
clude such members. 

As was pointed out earlier, all known response 
surface techniques were developed within the frame- 
work of linear models under the strong assumptions 
of normality and equal variances concerning the er- 
ror distribution. One important area that needs fur- 
ther investigation under the less rigid structure of 
generalized linear models is the choice of design. 

3. CHOICE OF DESIGN 

By a choice of design we mean the determination 
of the settings of the control variables that yield an 
estimated (predicted) response with desirable prop- 
erties. The mean response, /i(x), at a point x in a 
region of interest, R, is given by 

/ u(x) = / l [f T (x)/3] 
= %(x)], 



(3.1) 



where r?(x) is the linear predictor in (2.2), and h 
is the inverse function of the link function g. An 
estimate of //(x) is obtained by replacing (3 in (3.1) 
with (3, the maximum likelihood estimate of (3, that 



is, 



(3.2) 



/i(x) = /i[f r (x)/3]. 



The variance of /2(x) is approximately given by (see 
Khuri, 1993, page 198) 

Var[/i(x)] 



(3.3) 



1 



(iu(x) 



dr](-x) 



f T (x)(X T WX)" 1 f(x) 



where (j) is the dispersion parameter (determined by 
the exponential family considered), X is a matrix 
whose rows consist of f r (x) at the various settings 
of x used in a particular design and W is a diagonal 
matrix of the form 

(3.4) W = diag(ioi,y;2, . . . ,w n ), 

where n is the number of experimental runs, and 
1 ( dfj, u ' 



(3.5) 



1",. 



tot, 



dg u 



u ■ 



1,2,. ..,71, 



where a\ is the variance of y u , the response value 
at the uth experimental run, and denotes the 
derivative of fi(x) with respect to r/(x) evaluated at 
the setting of x at the uth experimental run (u = 
1,2,. ..,n). 
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The estimation bias incurred in /i(x) is approxi- 
mately given by (see Robinson and Khuri, 2003) 

Bias[/i(x)] 

(3.6) -fWwxr^w^ 



+ i-f r (x)(X T WX)- 1 f(x)^^ 



)- A f(x) 



dr/ 2 (x) 



where 



and 



£= __W-%F1, 



Z d = diag(z n ,Z 2 2,---,Znn), 

where z uu is the uth diagonal element of Z = 
X(X T WX)" 1 X T , F = diag(/n, / 22 , • • • , fun) with 



fu 



1 



d\i u 



di] u 



u ■■ 



1,2, 



. n. 



and l n is an n x 1 vector of l's. Here 4ttt- denotes 

the second-order derivative of /x(x) with respect to 
77 (x) evaluated at the uth experimental run (u = 
1,2,. ..,n). 

A good design is one that minimizes the mean- 
squared error of /t(x), namely, 

MSE[/}(x)]=£[/i(x)-/i(x)] 2 
1 j =Var[£(x)] + {Bias[/2(x)]} 2 . 

This is known as the mean-squared error of predic- 
tion (MSEP). One major problem in doing this min- 
imization is that the MSEP depends on (3, the pa- 
rameter vector in the linear predictor in (2.2), which 
is unknown. This leads us to the so-called design 
dependence problem. Other design optimality cri- 
teria such as A-, D-, E- and G-optimality, which 
are variance-based criteria, suffer also from the same 
problem. 

3.1 The Design Dependence Problem 

In the foregoing section it has been emphasized 
that in the context of a GLM, the minimization of 
the mean-squared error of prediction, or of the vari- 
ances of the parameter estimators, leading to the 
so-called optimal designs, depends on the values of 
unknown parameters. Common approaches to solv- 
ing this problem include: 

(a) The specification of initial values, or best 
"guesses," of the parameters involved, and the sub- 
sequent determination of the so-called locally opti- 
mal designs. 



(b) The sequential approach which allows the user 
to obtain updated estimates of the parameters in 
successive stages, starting with the initial values used 
in the first stage. 

(c) The Bayesian approach, where a prior distri- 
bution is assumed on the parameters, which is then 
incorporated into an appropriate design criterion by 
integrating it over the prior distribution. 

(d) The use of the so-called quantile dispersion 
graphs approach, which allows the user to compare 
different designs based on quantile dispersion pro- 
files. 

We now provide a review of the basic results that 
have been developed under the aforementioned ap- 
proaches. 

4. LOCALLY OPTIMAL DESIGNS 

Binary data under a logistic regression model and 
Poisson count data are the best known examples to 
illustrate the implementation of the first approach 
leading up to a locally optimal design. 

4.1 Logistic Regression Model 

Let us first discuss the study of optimal designs for 
binary data under a logistic regression model. The 
key reference is Mathew and Sinha (2001). Other 
related references include Abdelbasit and Plackett 
(1983), Minkin (1987), Khan and Yazdi (1988), Wu 
(1988) and Sitter and Wu (1993). It is postulated 
that a binary response variable y assumes the val- 
ues and 1 and the chance mechanism depends on 
a nonstochastic quantitative covariate X taking val- 
ues in a specified domain. Specifically, for X = x,y 
takes the value 1 with probability given by 



(4.1) 



7r(x) 



1 



1 + exp(— a — xf3) ' 



where a and /3 are unknown parameters with (3 > 0. 
It may be noted that there are other versions of this 
model studied in the literature. One version refers 
to the dose-response model which will be discussed 
in the next sections. The parameters themselves and 
also some parametric functions such as ^ and per- 
centiles of 7r(x) are of interest to the experimenter. 
The purpose is to suggest continuous (or approxi- 
mate) optimal designs, that is, optimum dose lev- 
els and their relative weights, following the termi- 
nology of continuous design theory. See Pukelsheim 
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(1993). It turns out that the solutions to the opti- 
mal design problems mentioned above provide op- 
timum values of a + x(3. Hence, in order to imple- 
ment such designs in practice, good initial estimates, 
or guess values, of a and are needed. Whereas 
the earlier researchers established optimality results 
case by case, Mathew and Sinha (2001) developed a 
unified approach to tackle the optimality problems 
by exploiting the property of Loewner-order dom- 
ination in the comparison of information matrices. 
However, in some of these studies the technical de- 
tails depend crucially on the symmetry of the trans- 
formed factor space. For asymmetric domains, Liski 
et al. (2002) have initiated some studies in the con- 
ventional regression setup. Much yet remains to be 
done there and also in the context of logistic regres- 
sion. 

With reference to the model (4.1), consider now 
s distinct dose levels x±,X2, ■ ■ ■ ,x s , and suppose we 
wish to obtain /j observations on y at dose level Xi 
(i = 1, 2, . . . , s). Let J2t=i fi = n - For most efficient 
estimation of a and (3, or some functions thereof, 
the exact optimal design problem in this context 
consists of optimally selecting the number s of dis- 
tinct dose levels, the x^s (in a given experimental 
region) and the /j's, with respect to a given optimal- 
ity criterion, for a fixed n. This is equivalent to the 
approximate determination of optimum dose levels 
with respective optimum (relative) weights, denoted 
by Pi (2 = 1,2,..., s), which sum to 1. 

For various application areas, it turns out that 
the estimation problems that are usually of interest 
refer to (a) the estimation of (3, or a/ (3, or some 
percentiles of tt(x) given in (4.1), or (b) the joint es- 
timation of a pair of parameters such as (i) a and /3, 
(ii) (3 and a/ {3, (iii) (3 and a percentile of ir(x) and 
(iv) two percentiles of ir(x). The approach is to start 
with the asymptotic variance-covariance matrix of 
the maximum likelihood estimators of a and (3, and 
then choose the Xj's and the pj's optimally by mini- 
mizing a suitable function, depending on the nature 
of the problem at hand and the specific optimal- 
ity criterion applied. For this reason, we consider 
the information matrix of the two parameters as 
a weighted combination of component information 
matrices based on the x,'s, using the pj's as weights. 
Then we argue that the asymptotic variance-covari- 
ance matrix of the maximum likelihood estimators 
of the parameters is just the inverse of the weighted 
information matrix computed above. The optimality 
functions to be minimized are different scalar- valued 



functions of the information matrix. The D- and 
A-optimality criteria are well-known examples. His- 
torically, the D-optimality criterion has received con- 
siderable attention in this context and A-optimality 
has also been considered by some authors; see Sitter 
and Wu (1993). As was mentioned earlier, the op- 
timum dose levels depend on the unknown parame- 
ters a, and (3, as is typical in nonlinear settings. In 
fact, solutions to the optimal design problems men- 
tioned above provide optimum values of a + f3xi, 
i = 1, . . . , s. Therefore, while implementing the op- 
timal design in practice, good initial estimates of a 
and (3 are called for. In spite of this unpleasant fea- 
ture, it is important to construct the optimal designs 
in this context; see the arguments in Ford, Torsney 
and Wu (1992, page 569). 

Following the approximate design theory, a design 
is denoted by T> = {(xi,pi),i = 1, 2, . . . , s}. There- 
fore, the information matrix for the joint estimation 
of a and [3 underlying the design T> is given by 

/ -A exp(-Oi) 



I(a,(3) 



(4.2) 



i=l 
s 

E Pi' 

i=l 
s 

E 

1=1 

s 

E 



(1 +exp(- 
exp(- 



PiX 



(l +exp (_ a .))2 

exp(— a 



i (l + exp( 
exp(- 



) \ 



an 



where a j = a + (3x{ , i 



(1 + exp(- 
l,...,s. 



-a,)) 2 / 



Note that except for the factors (i+exp(-aO)^ ' 



the 

information matrix is identical to the one under the 
usual linear regression of y on a nonstochastic re- 
gressor x under the assumption of homoscedastic 
error structure. This reminds one of the celebrated 
de la Garza Phenomenon (de la Garza, 1954) which 
can be explained as follows. Suppose we consider a 
pth-degree polynomial regression of y on x under 
homogeneous error structure and we start with an 
n-point design where n>p + 1. Then, according to 
this phenomenon, it is possible to come up with an 
alternative design with exactly p + 1 support points 
such that the two designs have identical information 
matrices for the entire set of p+ 1 parameters. This 
shows that in the homoscedastic scenario, essentially 
one can confine attention to the collection of designs 
supported on exactly p + 1 points, if the underlying 
polynomial regression is of degree p. On top of this, 
it is also possible that a particular (p+ l)-point de- 
sign dominates another (p + 1) -point design in the 
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sense that the information matrix (for the entire set 
of parameters) based on the former design Loewner- 
dominates that based on the latter. Here, Loewner 
domination means that the difference of the two in- 
formation matrices is nonnegative definite. It must 
be noted that Loewner domination is the best one 
can hope for, but it is rarely achieved. Pukelsheim 
(1993) has made some systematic studies of Loewner 
domination of information matrices. See Liski et al. 
(2002) for some applications. 

In the present setup, however, the form of the in- 
formation matrix indicates that we are in a linear 
regression setup, but with a heteroscedastic error 
structure. Does the de la Garza Phenomenon still 
hold in this case? More importantly, do we have 
Loewner domination here? The Mathew and Sinha 
(2001) article may be regarded as one seeking an- 
swers to the above questions. From their study, it 
turns out that though Loewner domination is not 
possible, for D-optimality the class of two-point de- 
signs is complete in the entire class of competing 
designs while for A-optimality, it is so in a sub- 
class of competing designs which are symmetric in 
some sense. Here, completeness is in the sense that 
any competing design outside the class is dominated 
(with respect to the specific optimality-criterion) by 
another design within the class. 

We shall briefly explain below the salient features 
of the arguments in Mathew and Sinha (2001). Note 
first that for the joint estimation of a and (3, or for 
that matter, for any two nonsingular transforms of 
them, the D-optimality criterion seeks to maximize 
the determinant of the joint information matrix of 
a and 0. Routine computations yield an interesting 
representation for this determinant, 



(3 2 \I(a,(3)\ 



(4.3) 



^ exp(-aj) 

2_^(l + eX p ( _ a .))2_ 

^ 2 expj-Oj) 
^ Piai (l + exp(-a,)) 2 



exp(-aj) 



li=l 



i=l 



(l + exp(-ai)) 2 



Note that for any real number a, cxp ^, a \ vl = 

J ' (l+exp(— a)Y 

cxp(a) Moreover, it also turns out that for a 

(l+exp^a)) 2 ' 

given T> = {(xi,pi),i = 1,2, ... ,s}, there exists a real 
number c> such that 

(A A) V n eXp(Qt) = eXp ( C ) 
{ - ) ^(l + exp^)) 2 (l + exp(c)) 2 ' 



(4.5) Y.P 



,01 



exp(flj) 



<c 



exp(c) 



i=l 



(l+exp(a;)) 2 " (l + exp(c)) 2 ' 



Therefore, for this choice of c, the design T>(c) = 
[(c, 0.5); (— c, 0.5)] provides a larger value of the de- 
terminant of the underlying information matrix than 
that based on T>. Hence, the class of two-point sym- 
metric designs provides a complete class of Z?-optimal 
designs. It is now a routine task to determine spe- 
cific D-optimal designs for various parametric func- 
tions. Of course, the initial solution is in terms of 
c-optimum (which is independent of a and (5) and 
then we have to transfer it to optimum dose levels, 
say xq and xqo, by using the relations c = a + (3xq 
and — c = a + /3xoo- Thus an initial good guess of a 
and P is called for to evaluate xq and xoo- 

Again, for ^4-optimality with respect to the pa- 
rameters a and j3, some algebraic simplification yields 
the following function (to be minimized) when we 
restrict to the subclass of symmetric designs (i.e., 
designs involving and — a, with equal weights for 
every i): 



Var(a)+Var(/3) 



exp(- 



(4.6) 



-y m 

^^U + expt 



1 

V i=l 



O; 



=1 

m 



[a 2 + /3 



2 + a 2 } 



exp(- 



E 



PiO-i 



(l + exp(-a i )) s 



cxp( 



(l + exp(-ai)) 2 
a 2 + /3 2 



+ 



m 1 p i a 2 exp(- 



■Oi)/(l + exp(- 
1 



YT=iPi exp(-ai)/(l +exp(-ai)) 2 ' 

In view of the existence of the real number c with 
the properties laid down above, it turns out that 
the design T>(c) = [(c, 0.5); (— c, 0.5)] once again does 
better than T> with respect to A-optimality, at least 
in the subclass of symmetric designs so defined. It 
is a routine task to spell out the nature of a specific 
^4-optimal design, which in this case depends on the 
unknown parameters a and (5 in a twofold manner. 
First, we have to determine the optimum value for c 
from given values of the parameters by minimizing 
the lower bound to (4.6) as a function of a,/3 and 
c. Then we have to evaluate the values of the two 
recommended dose levels using the relations involv- 
ing c and — c, as above. Note that in the context 
of D-optimal designs, this twofold phenomenon did 
not arise. 
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Remark 4.1.1. It so happens that without the 
symmetry restriction on the class of competing de- 
signs, construction of ^-optimal designs, in general, 
is difficult. Numerical computations have revealed 
that ^4-optimal designs are still point symmetric but 
not weight symmetric. An analytical proof of this 
observation is still lacking. In Mathew and Sinha 
(2001), .E-optimal designs have also been studied. 

4.2 Poisson Count Model 

Let us now consider Poisson count data and ex- 
plain the optimality results following Minkin (1993) 
and Liski et al. (2002). Here we assume that y fol- 
lows a Poisson distribution with mean /i(x), n(x) = 
c(x) exp [9(x)] , 6{x) = a + (3x; (3 < 0. The emphasis 
in Minkin (1993) was on most efficient estimation 
of 1/(3 by choosing a design in the nonstochastic 
factor space of x over [0, oo). Naturally, only a lo- 
cally optimal design could be characterized, which 
turned out to be a two-point design with 21.8% mass 
at and the rest at 2.557/3. Thus a good guess for 
(3 is called for. Liski et al. (2002) developed a uni- 
fied theory in Minkin's setup for the derivation of 
a stronger result on completeness of two-point de- 
signs, including the point 0. Consequently, it was 
much easier to re-establish Minkin's result as well 
as to spell out explicitly the nature of A-, D-, E- 
and M ^-optimal designs for simultaneous estima- 
tion of the two parameters in the model. It may be 
noted that Fedorov (1972) gave a complete char- 
acterization of D-optimal designs in a polynomial 
regression setup involving several families of het- 
erogeneous variance functions. In an unpublished 
technical report, Das, Mandal and Sinha (2003) ex- 
tended the Loewner domination results of Liski et 
al. (2002) in a linear regression setup involving two 
specific variance functions. We shall briefly present 
these results below. 

We recall that the setup of Minkin (1993), in a 
form suitable for our discussion, and as suggested in 
Liski et al. (2002), is of the form 

E(y) =a + (3x; 
(4.7) V(y)=v(x)a 2 ; 

v(x) = exp(x); < x < oo. 

As usual, we assume a 2 = 1 and confine attention to 
approximate design theory. Thus, as before, to start 
with we have an s-point design T> s = [(xi,pi);0 < 
x\ < X2 < • • • < x s ;J2iPi = 1] f° r s > 2. It is a rou- 
tine task to write down the form of the information 



matrix for the parameters a,f3 underlying the de- 
sign V s . We call it Lp s . Liski et al. (2002) established 
that given V s , one can construct a two-point design 
T>2 whose information matrix, say IT>2, dominates 
Iv s - Das, Mandal and Sinha (2003) generalized this 
result when the above form of v(x) is changed to 

(4 ' 8j (ii) v(x) = (l + x)^y 2 , 7 >-l. 

Following Das, Mandal and Sinha (2003), we start 
with a general variance function v(x) subject to 
v(0) = 1 and v{x) increasing in x over to oo. Next, 
we start with a two-point design of the form [(a,p); 
(b, q)] where < a < b < oo and < p,q = 1 — p < 
1. Then, we ask for the sort of variance functions 
for which this two-point design can be Loewner- 
dominated by another two-point design, including 
the point 0. Define in this context two other related 
functions, ip(x) = l/v(x) and <j)(x) = (v(x) — l)/x 
for every x > 0, 4>(0) being the limit of (j)(x) as x 
tends to 0. It follows that whenever these two func- 
tions satisfy the following conditions, it is possible 
to achieve this fit: 

(i) 0(0) = 1; 

(ii) (j>(x) is increasing in x; 

(iii) for some s and c, < s < 1 and c > 0, for 
which 

1 - s = - ^( a )) + q (l - V>(6))]/[1 - V(c)], 

V>(c) satisfies the inequality ip(c) <p^(a) +qip(b). 

Das, Mandal and Sinha (2003) demonstrate that 
for both forms of v(x) as in (4.8), the above con- 
ditions are satisfied. They then continue to argue 
that this result on Loewner domination (by a two- 
point design, including the point 0) holds even when 
one starts with an s-point design for s > 2. This 
greatly simplifies the search for specific optimal de- 
signs under different variance structures covered by 
the above two forms. Minkin's result follows as a 
special case of (i) in the above model. The details 
are reported in Das, Mandal and Sinha (2003). 

5. SEQUENTIAL DESIGN 

In the previous section, initial values of the pa- 
rameters are used as best "guesses" to determine a 
locally optimal design. Response values can then be 
obtained on the basis of the generated design. In the 
sequential approach, experimentation does not stop 
at this initial stage. Instead, using the information 
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thus obtained, updated estimates of the parameters 
are developed and then used to determine additional 
design points in subsequent stages. This process con- 
tinues until convergence is achieved with respect 
to a certain optimality criterion, for example, D- 
optimality. The implementation of such a strategy is 
feasible provided that the response values in a given 
stage can be obtained in a short time, as in sen- 
sitivity testing. Sequential designs for GLMs were 
proposed by Wu (1985), Sitter and Forbes (1997) 
and Sitter and Wu (1999), among others. 

The theory of optimal designs, as noted earlier, 
involves selection of design points with the goal of 
minimizing some objective function which can often 
be interpreted as an expected loss. In a sequential 
framework, with the arrival of each data point, one 
needs to make a decision of whether or not to pay 
for the cost of additional data or else to stop sam- 
pling and make a decision. Also, if additional obser- 
vations become necessary, when there is an option 
of selecting samples from more than one population, 
the choice of a suitable sampling rule is equally im- 
portant. The latter, on many occasions, amounts to 
the selection of suitable design points. Thus the issue 
of optimal designs goes hand in hand with sequen- 
tial analysis, and together they constitute an area 
of what has become known as sequential design of 
experiments. 

Research on sequential designs can be classified 
into multiple categories depending on the objective 
of the researcher. We review here primarily a broad 
area commonly referred to as "Stochastic Approxi- 
mation," which was initiated by Robbins and Monro 
(1951), and was subsequently extended by numerous 
authors. Here, the problem is one of sequential selec- 
tion of design points according to some optimality 
criterion. We also discuss briefly some work on mul- 
tistage designs. 

In Section 5.1, we begin with the stochastic ap- 
proximation procedure as described in Robbins and 
Monro (1951). We then consider several extensions 
and modifications of this pioneering work, and dis- 
cuss asymptotic properties of the proposed meth- 
ods. The Robbins-Monro article and much of the 
subsequent work do not prescribe any stopping rule 
in this sequential experimentation. There are a few 
exceptions, and we point out one such result. 

Section 5.2 discusses application of the Robbins- 
Monro method and its extensions for estimating the 
percentiles of the quantal response curve, in partic- 
ular, estimation of the median effective dose, pop- 



ularly known as ED50. We highlight in this con- 
text the work of Wu (1985). We also discuss briefly 
the Bayesian stopping rule as proposed by Freeman 
(1970). Finally, in Section 5.3, we provide a very 
short account of multistage designs. 

In general, the core of sequential analysis involves 
sample size determination. This itself is an "opti- 
mal" design problem as stopping rules, in general, 
are motivated by some optimality criteria. A very 
succinct account of sequential design of experiments, 
motivated by several important statistical criteria, 
appeared in Chernoff's (1972) classic monograph. 

5.1 Stochastic Approximation 

We begin with a description of the stochastic ap- 
proximation procedure of Robbins and Monro (RM) 
(1951). Let y be a random variable such that con- 
ditional on x, y has a distribution function (df) 
H(y\x) with mean E(y\x) = M(x) and variance 
V(y\x) = (T 2 (x). The function M(x) is unknown to 
the experimenter, but is assumed to be strictly in- 
creasing so that the equation M(x) = a has a unique 
root, say C- R is desired to estimate ( by making 
successive observations on y, say, 2/1,2/2, •■ • at levels 

xi,X2, The basic problem is the selection of the 

design points xi,X2, 

RM address this problem as follows: consider a 
nonstationary Markov chain with an arbitrary initial 
value x±, and then define recursively 

(5.1) x n+ i = x n - a n (y n - a), n = l,2,..., 

where y n has the conditional distribution function 
H(y\x n ), and {a n } is a sequence of positive con- 
stants satisfying 

00 

< < 00. 

n=l 

A finite sample justification of the RM procedure 
is given by Wu (1985) when y and x are related by 
a simple linear regression model. In particular, let 
Hi = Po + 0iXi + ei, where ei are independent and 
identically distributed with mean 0. Then the pa- 
rameter of interest is £ = — fto/fti, the solution of 
flo + j3\x = 0. If 0i is known, 0Q n = y n — 0\x n is the 
least-squares estimator of 0o based on {(xi,yi);i = 
1, . . . , n}. Thus a natural choice of x n+ \ is given by 
x n +i = -Pon/Pi =x n - (ii l y n - R is shown in Lai 
and Robbins (1979) that x n+ \ = x n — @i y n for all 
n is equivalent to x n+ i = x n — (/3f 1 /n)y n for all 
n. This is an RM procedure with a n = f3^ 1 jn and 

Q = 0. 
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RM investigated asymptotic properties of their 
procedure. Let b n = E(x n — Q 2 . RM showed that 
for bounded y and strictly increasing M(x), if a n = 
0(n _1 ), then b n — > as re — > oo. 

A more general result is proved later in Robbins 
and Siegmund (1971). Rather than the boundedness 
of y and the monotonicity of M(x), one assumes 
boundedness of a(x) + |M(x)| by a linear function of 
\x\. These conditions neither imply nor are implied 
by the original RM conditions. In addition, Rob- 
bins and Siegmund (1971) have another assumption 
which essentially implies that M{x) and x — C & re 
of the same sign. Finally, they assume that 

oo oo 

(5.2) 0<^]a n = oo, 0<^a 2 <oo, 

n=l n=l 

which is strictly weaker than the condition a n = 
O^" 1 ) as required in RM. 

Fabian (1968) proved the asymptotic normality of 
x n under additional assumptions. His conditions are 
similar to those of Robbins and Siegmund, but (5.2) 
is replaced by the stronger condition 

(5.3) nia n -> a(> 0) 

as re — > oo for some 7 S (1/2, 1]. Also, he needed a 
Lindeberg-type condition for the second moment of 
the conditional distribution of y given x and also 
required M to have a positive derivative M'(Q at 
£ and M'(Q > (2a) -1 when 7 = 1. The variance 
of the asymptotic distribution depended on a(Q, 
a and M'(£). The asymptotic variance was given by 

aa 2 {Q/{2M\Q) if 7 € (1/2, 1) and aV(C)/ 
[2aM'(C) - 1] if 7 = 1. 

Remark 5.1.1. The best rate of convergence is 
achieved when 7 = 1, and then the minimum asymp- 
totic variance is given by cr 2 (£)/[M'(£)] 2 . This is at- 
tained when a = [M'(C)] -1 . 

In view of Remark 5.1.1, an optimal asymptotic 
choice of {a n } is given by a n = (n + no)~ 1 d n , where 
d n is any consistent estimator of M'(£) and uq is 
positive. One can interpret no as a "tuning param- 
eter" which does not affect the asymptotics at all, 
but can play a major role when the sample size is 
small. Also, Fabian (1983) has shown that when the 
conditional probability density function of y given x 
is N(M(x), a 2 (x)), then the RM procedure is locally 
asymptotically minimax. Moreover, Abdelhamid 
(1973) and Anbar (1973) pointed out independently 



that even for nonnormal conditional probability den- 
sity functions, the RM process can be made asymp- 
totically optimal by a suitable transformation of ob- 
servations. 

The next thing is to outline procedures which guar- 
antee consistent estimation of M'(Q. Venter (1967) 
addressed this by taking observations in pairs at 
Xi ± Cj for a suitable positive sequence of constants 
{ch}, i > 1. We denote by ya and y{ 2 the correspond- 
ing responses so that 

(ka\ E(y il \x i ) = M(x i -c i ); 

{0 ' V E{y i2 \x i ) = M{x i + c i ). 

Let Zi = (y i2 - yn)/(2a) and y-i = (y i2 +yn)/2. If 

Xi C, then estimate M'(() by d n = z n = \ Y2=i z i- 
Define recursively the design points 

(5.5) x n+ i =x n - {nd n )~ l {y n -a), re =1,2,..., 

where d n is a truncated version of d n ensuring that 

x n —* C- Fabian (1968) and later Lai and Robbins 
(1979) contain relevant asymptotic results. 

In spite of all the asymptotic niceties, the RM pro- 
cedure can be seriously deficient for finite samples. 
This was demonstrated in the simulation studies of 
Wu (1985) and Frees and Ruppert (1990). First, the 
initial choice of x\ heavily influences the recursive 
algorithm. If x\ is far away from £, then the conver- 
gence of x n to Q may demand prohibitively large n. 
Again, if all the closely clustered around £, 

no estimator of M'{Q can be very accurate, and im- 
precise estimation of M'(£) leads to imprecise esti- 
mation of C due to the proposed recursive algorithm, 
at least for small samples. 

The original RM paper, and much of the sub- 
sequent literature, address sequential design prob- 
lems without specifying any stopping rule. However, 
a simple stopping rule can be proposed based on 
asymptotic considerations. In particular, in the setup 
of Venter (1967), d n consistently estimates M'{Q. 
Defining a 2 = (2n) _1 S?=i(j/a+2/f2) ' ^ can shown 
that <t 2 is a consistent estimator of c 2 (C)- Now in- 
voking the asymptotic normality of x n , a large sam- 
ple 100(1 — 7)% confidence interval for £ is given by 
x n ± ^"H 1 - 7/2)<7„/{(i n (2re) 1 / 2 }, where $ is the 
cumulative distribution function of the N(0, 1) vari- 
able. If one decides to construct a confidence inter- 
val of length not exceeding I, then one can stop at 
the first n for which the width of the above inter- 
val is less than or equal to I. Sielken (1973) pro- 
posed this stopping rule and proved that as I — > (so 
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that n — > oo), the coverage probability converges to 
1 — 7. Alternative stopping rules have been proposed 
by Stroup and Braun (1982) and Wei (1985) when 
the responses are normal. Ruppert et al. (1984) pro- 
posed a stopping rule for Monte Carlo optimization 
which is so designed that M{x n ) is within a specified 
percentage of M(Q. 

Kiefer and Wolfowitz (1952) considered a related 
problem of locating the point where the regression 
function is maximized or minimized instead of find- 
ing the root of a regression equation as in RM. This 
amounts to solving M'(£) = 0. The resulting proce- 
dure is similar to that of RM, but y n — a in (5.1) 
is replaced by an estimate of M'(£). We omit the 
details. 

One of the major applications of the RM stochas- 
tic approximation procedure is the estimation of the 
percentiles of the quantal response curve. We shall 
specifically discuss this problem in the next section. 

5.2 Quantal Response Curves 

Consider now the special case when y is a binary 
variable with 

(5.6) E{y\x) = tt(x) = [1 + exp{-9(x - n)}]" 1 . 

The independent variable x is the dose level, while 
the binary response y is a quantal variable. Such a 
model is called a "dose-response" model. The pa- 
rameter fi is the 50% response dose, or ED50. It is 
easy to recognize this as a logistic model with lo- 
cation parameter [i and scale parameter 9 and as 
a simple reparameterization of the model given in 
(4.1). Also, here tt(x) is the same as M(x). Equating 
tt(x) to a leads to the solution £ = /i + # _1 log(y^). 
Hence, efficient estimation of £ depends on efficient 
estimation of /i and 9. 

The direct RM procedure will continue to gener- 
ate the x-sequence with an arbitrary initial value x\, 
and then define x n+ \ = x n — a n (y n — a), where one 
may take proportional ton . However, as rec- 
ognized by many, including RM, for specific distri- 
butions there may be a more efficient way of generat- 
ing the x values. Indeed, simulations by Wu (1985), 
and subsequently by Frees and Ruppert (1990), 
demonstrate the poor performance of the RM pro- 
cedure for small n in this example. 

Wu's procedure begins with generating some ini- 
tial values X\, . . . , x m . Then one obtains the MLE's 
fi m and 9 m of /i and 6, respectively, based on {(xi, yi); 
i = l,...,m} by solving the likelihood equations 
W ET=iVi = Y?=i<Xi) and (ii) T£i*iVi = 



X^i£i7r(xi). Let x m+ i = fi m + 9^ log( T ^). Up- 
date the MLE's of n and 9 by fi m +\ and 9 m+ i, re- 
spectively, based on {(xi,yi);i = 1, . . . , m + 1}. Now 

let x m+2 = fim+i + 9^ l 1 +1 \og( J ^) , and contmue in 
this manner. The {x n } generated in this way meet 
the consistency and asymptotic optimality proper- 
ties mentioned in the previous section. Wu (1985) 
also suggested modification of the likelihood equa- 
tions (i) and (ii) by multiplying both sides of the ith 
component by a factor w(\xi — x m \), where w is a 
certain weight function. This can partially overcome 
vulnerability of the logits at extreme tails. 

As mentioned earlier, the case a = 1/2 is of special 
interest since then £ = /i = ED50. In this case, an 
alternative procedure known as the "up-and-down" 
procedure for generating the design points was pro- 
posed by Dixon and Mood (1948). Specifically, let 



X n +1 



+ A, 
- A, 



h Vn 
if Vn 



0, 
1, 



where A is somewhat arbitrary. Once again, perfor- 
mance of this procedure depends very much on the 
choice of a good guess for x\ and A. Unless A is 
made adaptive, the large sample performance of x n 
cannot be studied. Wetherill (1963) discusses some 
modifications of this method. 

One important issue that has not been discussed 
so far is the choice of the stopping rule. This requires 
explicit consideration of payoff between the cost of 
further observation and that of less accurate esti- 
mation. The study was initiated by Marks (1962) 
who considered the problem as one of Bayesian se- 
quential design with known 9 and a two-point prior 
for fi. Later, Freeman (1970) considered the prob- 
lem, once again for known 9, but with a conjugate 
prior for //. In addition, he considered special cases 
of one, two or three dose levels. For one dose level, 
he introduced the prior 

exp(r o 0(C - M)) 
[l + exp(<9(C-/i))f° 

This amounts to a Beta(ro,no — vq) prior for the 
response probability a at dose level with prior 
parameters < ro < uq . 

With squared error loss plus cost and a uniform 
prior, that is, tq = I and uq = 2, Freeman (1970) 
considered the usual backward induction argument 
to set up the necessary equations for finding the 
Bayes stopping rule. The stopping rule cannot be 
found analytically, but Freeman provided the nu- 
merical algorithm for finding the solution. He also 
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considered situations with two or three possible doses, 
but did not provide a general algorithm. Thus, what 
is needed here is an approximation of the "optimal" 
stopping rule. A Bayesian approach using dynamic 
programming is hard to implement in its full gen- 
erality. It appears that a suitable approximation of 
the Bayes stopping rule which retains at least the 
asymptotic optimality of the regular Bayes stopping 
rule is the right approach toward solving this prob- 
lem. 

5.3 Multistage Designs 

In most practical situations, it is more realistic 
to adopt a multistage design rather than a sequen- 
tial design, since continuous updating of the infer- 
ential procedure with each new observation may not 
be very feasible. We discuss one such application as 
considered in Storer (1989) in the context of Phase I 
clinical trials. These trials are intended to estimate 
the maximum tolerable dose (MTD) of a new drug. 
Although a strict quantitative definition of MTD 
does not exist in clinical trials, very often the 33rd 
percentile of the tolerance distribution is taken to 
define MTD. 

From a clinician's perspective, an optimal design 
is one in which the MTD is defined by the dose at 
which the trial stops. Storer (1989, 1990) argues that 
the design problem should be viewed instead as an 
efficient way of generating samples, wherein the de- 
sign and analysis are robust to the vagaries of pa- 
tient treatment in a clinical setting. He first intro- 
duces four single-stage designs, and then proposes 
two two-stage designs by combining some of these 
single-stage designs. The details are available in the 
two cited papers. Storer implemented his proposal 
for MTD estimation in a dose-response setting with 
three logistic curves. 

6. BAYESIAN DESIGNS 

Application of Bayesian design theory to general- 
ized linear models is a promising route to avoid the 
design dependence problem. One approach, as dis- 
cussed in Section 3, is to design an experiment for a 
fixed best guess of the parameters leading to a "lo- 
cally optimal" (Chernoff, 1953) design. Locally opti- 
mal designs for nonlinear models were first suggested 
in the seminal paper by Box and Lucas (1959). As 
suggested in Box and Lucas (1959), another natu- 
ral approach to solve this problem is to express the 
uncertainty in the parameters through a prior dis- 
tribution on the parameters. The Bayesian design 



problem for normal linear models has been discussed 
in Owen (1970), Brooks (1972, 1974, 1976, 1977), 
Chaloner (1984), Pilz (1991) and DasGupta (1996). 
Chaloner and Verdinelli (1995) present an excellent 
overview of Bayesian design ideas and their appli- 
cations. Atkinson and Haines (1996) discuss local 
and Bayesian designs specifically for nonlinear and 
generalized linear models. 

6.1 Bayesian Design Criteria 

The Bayesian design criteria are often integrated 
versions of classical design optimality criteria where 
the integration is carried out with respect to the 
prior distribution on (3 [j3 is as in (2.2)]. Most of 
the Bayesian criterion functions are based on nor- 
mal approximations to the posterior distribution of 
the vector of parameters (3, as computations involv- 
ing the exact posterior distribution are often in- 
tractable. Several such approximations to the pos- 
terior are available (Berger, 1985) and involve ei- 
ther the observed or the expected Fisher informa- 
tion matrix. The most common form of such nor- 
mal approximation states that under standard reg- 
ularity conditions, the posterior distribution of j3 is 
N(/3, [nl(3,£)] _1 ), where f3 is the maximum likeli- 
hood estimate (MLE) of (3, and for any given design 
measure £, the expected Fisher information matrix 
is denoted by I(/3, £). We shall assume, as usual, that 
the design measure £ puts relative weights (pi,P2, ■ ■ ■ , 
Pk) at k distinct points (x\, . . . ,Xf~), respectively, 
with £?=iPi = 1. The prior distribution for (3 is 
used as the predictive distribution of (3 and con- 
sequently the Bayesian optimality criteria can be 
viewed as approximations to the expected posterior 
utility functions under the prior distribution p{f3). 
The first criterion which is an analogue to the D- 
optimality criterion in classical optimality theory is 
given by 

(6.1) <MO = £/3[logdetI(/3,£)]. 

Maximizing this function is equivalent to approxi- 
mately maximizing the expected increase in the 
Shannon information or maximizing the expected 
Kullback-Leibler distance between the posterior and 
prior distributions (Lindley, 1956; DeGroot, 1986; 
Bernardo, 1979). 

The next criterion is of interest when the only 
quantity to be estimated is a function of /3, say h((3). 
In such situations, the approximate asymptotic vari- 
ance of the MLE of h(f3) is 

cG3) T p[()3 ) e)]- 1 c09), 
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where the ith element of c(/3) is q(/3) = dh((3)/d(3i. 
The Bayesian c-optimality criterion approximates 
the posterior expected utility (assuming a squared 
error loss) under the prior p((3) as 

(6.2) MO = -EpicipfllifrO}- 1 ^)}. 

As in the generalization from c- to A-optimality 
in classical optimality theory, if one is interested in 
estimating several functions of (3 with possibly dif- 
ferent weights attached to them and if B(/3) is the 
weighted average of the individual matrices of the 
form c(/3)c(/3) T , then the criterion to be maximized 
is 

(6.3) U0 = -E f 3{trB((3)[I(P,0r 1 }- 

The functions c(/3), B(/3) may not depend on (3 if 
one is considering only linear functions of (3. 

Since the interpretation of the Bayesian alpha- 
betic optimality criteria, as approximations to ex- 
pected utility, is based on normal approximations to 
the posterior, Clyde and Chaloner (1996, 2002) sug- 
gest several approaches to verify normality through 
imposing constraints and discuss how to attain such 
multiple design objectives in this context. Other de- 
sign criteria which can be related to a Bayesian 
perspective appear in Tsutakawa (1972, 1980), Za- 
cks (1977) and Pronzato and Walter (1987). How 
well the Bayesian criteria actually approximate the 
expected utility in small samples is not very well 
known. Some illustrations are presented in Atkinson 
et al. (1993), Clyde (1993a) and Sun, Tsutakawa and 
Lu (1996). Dawid and Sebastiani (1999) attempt to 
connect this type of criterion-based Bayesian design 
to a purely decision-theoretic utility-based approach 
to Bayesian experimental design. 

Midler and Parmigiani (1995) and Miiller (1999) 
suggest estimating the exact posterior utility through 
Markov chain Monte Carlo (MCMC) methods in- 
stead of using analytical approximations to the pos- 
terior distribution. They embed the integration and 
maximization of the posterior utility function by 
curve fitting of Monte Carlo samples. This is done by 
simulating draws from the joint parameter /sample 
space and evaluating the observed utilities and then 
fitting a smooth surface through the simulated points. 
The fitted surface acts as an estimate to the ex- 
pected utility surface and the optimal design can 
then be found deterministically by studying the ex- 
trema of this surface. Parmigiani and Miiller (1995), 
Clyde, Miiller and Parmigiani (1995) and Palmer 
and Miiller (1998) contain applications of these 
simulation-based stochastic optimization techniques. 



6.2 Bayesian Optimality and 
Equivalence Theorems 

All of the Bayesian criteria mentioned above are 
concave over the space of all probability measures 
on the design space X . The equivalence theorem for 
establishing optimality of a design for linear models 
(Whittle, 1973) has been extended to the nonlin- 
ear case by White (1973, 1975), Silvey (1980), Ford, 
Torsney and Wu (1992), Chaloner and Larntz (1989) 
and Chaloner (1993). 

Let £ be any design measure on X . Let the mea- 
sure £ put unit mass at a point x € X, and let the 
measure £' be defined as 

f = (l-e)f + e£ fore>0. 

Let I*(£) and I*(£) denote the information matri- 
ces corresponding to the design measures £ and £, 
respectively. Then the information matrix for £' is 

r(O = (i-e)r(0 + ei*(D- 

The key quantity in the equivalence theorem is the 
directional derivative of a criterion function <j> at the 
design £ in the direction of £, usually denoted by 
d(£,x), and is defined as 

d(e,i) = iimi[^{p(O}-^{r(0}]. 

The general equivalence theorem states that in or- 
der for a design £* to be optimal, the directional 
derivative function in the direction of all single-point 
design measures has to be nonpositive, thus, 

supd(£*,x) = 0. 

It also states that if <j> is differentiable, then at the 
support points of the optimal design, the directional 
derivative function d(£,x) should vanish. The usual 
approach to evaluate Bayesian optimal designs for 
GLMs is through finding a candidate design by nu- 
merical optimization of the criterion function. Veri- 
fication of global optimality is then done by study- 
ing the directional derivative function d(£,x) for the 
candidate design under consideration. 

Example. Chaloner and Larntz (1989) and Zhu 
and Wong (2001) consider the problem of estimating 
quantiles in a dose-response experiment, relating the 
dose level of a drug x to the probability of a response 
at level x, namely, ir(x). A popular model is the 
simple logistic model as described in (5.6), namely, 

(6.4) log(Tr(x)/(l-ir(x)))=9(x-fi). 
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Here, (3 = (6, n) T and the Fisher information matrix 



is 



e 2 t -et{x-n) 

-9t(x — fM) S + t(x — fJ,) 2 



where to* = n(xi)(l - 7r(xj)), t = J2i=iPi w i, x = 
t~ 1 Y,i=iPi w i x i and s = Y,i=iPiWi(xi -x) 2 . Then, 
the Bayesian D-optimality criterion as given in (6.1) 
simplifies to the form 



(6.5) 



L (0 = E [ioge 2 ts]. 



Recall that the parameter [i is the median effective 
dose (ED50) or median lethal dose (LD50). If the 
goal is estimating /j,, c(/3) = (1,0) T does not depend 
on the parameters. More generally, one may want 
to estimate the dose level xq at which the probabil- 
ity of a response is a fixed number, say, a. Clearly, 
£0 = ^ + 7/$ where 7 = log[o/(l — a)], and xo is a 
nonlinear function of the unknown parameters. In 
this case, c(/3) = (l,—j/0 2 ) T does depend on the 
parameters. The Bayesian c-optimality criterion [as 
in (6.2)] for estimating any percentile of the logistic 
response curve reduces to 



(6.6) 



m) = - Ep{e- 2 it- 1 + ( 7 - 0(3? - v)Y 



This can also be written as a special case of 
A-optimality as in (6.3), with B(/3) = c(/3)c(/3) T , 
namely, 



B(/3) = B 7 (/3) 



-I/O 2 



-I/O 2 



where B(/3) is denoted as B 7 (/3) in this case to re- 
flect its dependence on the value of the constant 7. 
If one wants to estimate ED50 and ED95 simultane- 
ously with weight 0.5 each, then B(/3) = 0.5Bq(/3) + 
0.5B 2 . 9 44(/9) (note that log[0.95/(l - 0.95)] = 2.944, 
implying for ED95, 7 = 2.944). 

For the Bayesian D- and yl-optimality criteria, as 
mentioned in (6.1) and (6.3), the directional deriva- 
tive function, respectively, turns out to be 

d&x) =Ep[tr{I((3,0[mO\~ 1 }} -V, 

d^,x)=E f3 [tv{B(p)[mor 1 M(3,o[i((3,or 1 }} 

where £ as defined earlier is the measure with unit 
mass on x € X . Bayesian c-optimality can be viewed 
as a special case of A-optimality with B(/3) = 
c(f3)c((3) T . 



For our logistic model (6.4), the directional deriva- 
tive function with the <pi criterion reduces to 



d{£,x) = E l3 {w(x,f3)[t- 1 + (x 



X) s 



where w(x,(3) = 7r(x)(l — n(x)). For estimating any 
percentile of the dose-response curve, the directional 
derivative function corresponding to the 4>2 criterion 
for c-optimality [as in (6.2)] reduces to 

d(Z,x)=Ep{iv(x 1 (3)(6st)- 2 

+ [t(x-x)(x-n) + s] 2 } + cf>l(0. 

Chaloner (1987) and Chaloner and Larntz (1988, 
1989) developed the use of such Bayesian design cri- 
teria. The general equivalence theorem for these cri- 
teria can be derived under suitable regularity 
conditions (Chaloner and Larntz, 1989; Chaloner, 
1993; see also Lauter, 1974, 1976; Dubov, 1977). 
The directional derivative d(£,x) is evaluated over 
the range of possible values of x to check the global 
optimality of a candidate design. 

6.3 Binary Response Models 

The Bayesian design literature outside the normal 
linear model is vastly restricted to binary response 
models. Tsutakawa (1972, 1980), Owen (1975) and 
Zacks (1977) all have considered optimal design prob- 
lems for binary response models from a Bayesian 
perspective. Many of these designs are restricted to 
equally spaced points with equal weights at each 
point. Chaloner and Larntz (1989) investigate the 
Bayesian L>-optimality and A-optimality criteria with 
several choices of B(/3) in the context of a binary 
response logistic regression model with a single de- 
sign variable x. As illustrated in the above example, 
Chaloner and Larntz (1989) consider the problem 
of finding ED50 and ED95, and of estimating the 
value of x at which the success probability equals 
7 with 7 having a uniform distribution on [0, 1] . 
The last problem is defined as average percentile 
response point estimation. The optimal designs are 
obtained through implementing the simplex algo- 
rithm of Nelder and Mead (1965). They assume in- 
dependent uniform priors on the parameters and 
evaluate the expectation over the prior distribution 
through numerical integration routines. They notice 
that the number of support points of the optimal 
design grows as the support of the prior becomes 
wider and the designs in general are not equispaced 
and not supported with equal weights. Smith and 
Ridout (1998) extend the computational algorithm 
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Table 1 

Bayesian optimal designs for the dose-response model in (6.4) with optimality criterion being the average log determinant [as 

given in (6.5)] 



Prior on fi 


Prior on 


value 


TVllTTlTlPF 

J. 1 LL111UU1 

of doses (fi) 


Design points 


Weights 


U[-2,2] 


U[l,5] 


4.35 


8 


-1.952, -1.289, -0.762, -0.257, 


0.119,0.124,0.126,0.132, 










0.257, 0.762, 1.289, 1.952 


0.132, 0.126, 0.124, 0.119 


U[-2,2] 


U[2,4] 


4.35 


8 


-1.906, -1.093, -0.488, -0.000 a , 


0.132,0.156,0.149,0.064, 










0.000 a , 0.488, 1.093, 1.906 


0.064,0.149,0.156,0.132 


U[-2,2] 


U[2.9,3.1] 


4.36 


6 


-1.882,-0.965,-0.287, 


0.140,0.190,0.170, 










0.287,0.965,1.882 


0.170,0.190,0.140 


U[-0.5,0.5] 


U[l,5] 


3.43 


3 


-0.661,0.0,0.661 


0.389,0.222,0.389 


uj-0.5,0.5] 


U[2,4] 


3.27 


3 


-0.581,0.0,0.581 


0.457,0.087,0.457 


uj-0.5,0.5] 


U[2.9,3.1] 


3.21 


2 


-0.546,0.546 


0.5,0.5 


uj-0.1,0.1] 


U[l,5] 


3.28 


2 


-0.499,0.499 


0.5,0.5 


U[-0.1,0.1] 


U[2,4] 


3.07 


2 


-0.512,0.512 


0.5,0.5 


U[-0.1,0.1] 


U[2.9,3.1] 


3.00 


2 


-0.516,0.516 


0.5,0.5 




= 3 


2.99 


2 


-0.515,0.515 


0.5,0.5 



The rounded-off value 0.000 is exactly evaluated as 0.0003. 



of Chaloner and Larntz (1988) to find locally and 
Bayesian optimal designs for binary response mod- 
els with a wide range of link functions and uniform, 
beta or bivariate normal prior distributions on the 
parameters. 

There is wide interest in the binary response model 
in the context of dose-response bioassays. Markus 
et al. (1995) consider a Bayesian approach to find a 
design which minimizes the expected mean-squared 
error of an estimate of ED50 with respect to the 
joint prior distribution on the parameters of the re- 
sponse distribution. Sun, Tsutakawa and Lu (1996) 



reiterate that approximation of the expected util- 
ity function, using the usual Bayesian design cri- 
teria, can be poor, and they introduce a penalized 
risk criterion for Bayes optimal design. They illus- 
trate that the chance of having an extreme poste- 
rior variance could be avoided by sacrificing a small 
amount of posterior risk by adding the penalty term. 
Kuo, Soyer and Wang (1999) use a nonparamet- 
ric Bayesian approach assuming a Dirichlet process 
prior on the quantal response curve. They adopt 
the simulation-based curve fitting ideas of Miiller 
and Parmigiani (1995) to reduce computational time 



Table 2 

Bayesian optimal designs for the dose-response model in (6.4) when the variance of the estimate of ED95 is considered as 
the optimality criterion [as given in (6.6) with 7 = 2.944 corresponding to ED95] 



Prior on \i 


Prior on 6 


Criterion 
value 


Number 
of doses (n) 


Design points 


Weights 


U[-2,2] 


U[l,5] 


9.39 


8 


-2.401, -1.484, -0.959, -0.416, 


0.015,0.040,0.086,0.118, 










0.129,0.693,1.305,2.230 


0.123,0.125,0.127,0.366 


U[-2,2] 


U[2,4] 


6.58 


6 


-1.514,-0.905,-0.300, 


0.034,0.113,0.187, 










0.359,1.085,2.096 


0.199,0.190,0.277 


U[-2,2] 


U[2.9,3.1] 


6.09 


6 


-1.458,-0.852,-0.327, 


0.042,0.116,0.173, 










0.274,1.031,2.079 


0.197,0.210,0.262 


U[-0.5,0.5] 


U[l,5] 


6.72 


4 


-1.550,-0.001,0.641,1.282 


0.189,0.083,0.178,0.551 


uj-0.5,0.5] 


U[2,4] 


3.52 


3 


-0.898,-0.028,0.938 


0.083,0.134,0.783 


U[-0.5,0.5] 


U[2.9,3.1] 


2.96 


2 


-0.127,0.912 


0.138,0.862 


U[-0.1,0.1] 


U[l,5] 


5.98 


3 


-1.631,0.487,1.159 


0.226,0.190,0.585 


U[-0.1,0.1] 


U[2,4] 


2.80 


2 


-0.938,0.780 


0.164,0.836 


uj-o.i.o.i] 


U[2.9,3.1] 


2.25 


2 


-0.759,0.794 


0.102,0.898 


jtt= 


= 3 


2.19 


2 


-0.800,0.800 


0.093,0.907 
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appreciably. Clyde, Miiller and Parmigiani (1995) 
and Flournoy (1993) also use the logistic regres- 
sion model and present Bayesian design and analysis 
strategies for two very interesting applications. 

Zhu, Ahn and Wong (1998) and Zhu and Wong 
(2001) consider the optimal design problem for esti- 
mating several percentiles for the logistic model with 
different weights on each of them (the weights de- 
pending on the degree of interest the experimenter 
has in estimating each percentile). They use a mul- 
tiple objective criterion which is a convex combina- 
tion of individual objective criteria. They modify the 
logit-design software of Chaloner and Larntz (1989) 
to optimize this compound criterion. Zhu and Wong 
(2001) compare the Bayes optimal designs with se- 
quential designs proposed by Rosenberger and Grill 
(1997) for estimating the quartiles of a dose-response 
curve. Zhu and Wong (2001) note that the sequential 
design, based on a generalized Polya urn model, is 
comparable to the Bayes optimal design. When com- 
pared to the locally compound optimal designs in 
Zhu and Wong (2001), it was noted, as anticipated, 
that the Bayesian design performs better than the 
locally optimal design, if the specified parameters 
are far from the true parameters. Berry and Frist- 
edt (1985), Berry and Pearson (1985), Parmigiani 
(1993) and Parmigiani and Berry (1994) examine 
several clinical design problems using Bayesian ideas 
not limited to just dose-response studies. 

Example 1. Using the program by Smith and 
Ridout (1998), we evaluated the Bayesian optimal 
designs for several choices of prior and criterion 
functions in the context of the dose-response model 
mentioned in (6.4). We assumed that there is some 
prior knowledge on the range of \x and the sign of 9, 
which is often realistic. We arbitrarily chose the best 
guess of /i to be and of 9 to be 3 and assumed uni- 
form priors of different spreads around these centers. 
Table 1 contains optimal designs for several choices 
of priors when one uses the Bayesian D-optimality 
criterion as given in (6.5). Table 2 contains the opti- 
mal designs when one uses the Bayesian c-optimality 
criterion for minimizing the variance of the estimate 
of the 95th percentile of the logistic response curve. 
The explicit formula for this variance is given in 
(6.6) with 7 = log(0.95/(l - 0.95)) = 2.944 for es- 
timating ED95. 

The program optimizes the design criterion under 
consideration for a fixed value of n, the number of 
doses. The user has to vary n manually. Then one 



chooses the design which optimizes the criterion un- 
der consideration for the smallest number of doses. 
The global optimality of the design is then checked 
by evaluating the directional derivative d(£, x) over 
all possible values of x. The results are fairly clear, as 
noted in Chaloner and Larntz (1989); as we increase 
the spread of the prior distribution, the number of 
support points of the optimal design increases. With 
decreasing variability in the prior distribution, the 
designs become closer to the locally optimal design 
for fj, = and 9 = 3. While for the D-optimality cri- 
terion the optimal designs in Table 1 are symmetric 
about 0, for the c-optimality criterion for estimating 
ED95, the designs in Table 2 are asymmetric, tend- 
ing to put more weight toward larger doses. One 
also notes that the designs for a given prior on \i are 
robust with respect to the choice of prior on 9, but 
the converse is not true. The optimal designs change 
quite appreciably as the uncertainty in the prior in- 
formation on fi changes. We also experimented with 
independent normal priors and essentially noted the 
same basic patterns. The results are not included 
here. 

6.4 Exact Results 

Chaloner (1993) characterizes the </>i-optimal de- 
signs for priors with two support points for logis- 
tic regression models with a known slope. She also 
provides sufficient conditions for a one-point design 
to be optimal under both local optimality and a 
Bayesian criterion with a nondegenerate prior dis- 
tribution for a general nonlinear model. As antic- 
ipated, the conditions basically reduce to the sup- 
port of the prior being sufficiently small. Dette and 
Neugebauer (1996) provide a sufficient condition for 
the existence of a Bayesian optimal one-point design 
for one-parameter nonlinear models in terms of the 
shape of the prior density. Haines (1995) presents 
an elegant geometric explanation of these results for 
priors with two support points. Dette and Sperlich 
(1994) and Mukhopadhyay and Haines (1995) con- 
sider exponential growth models with one parameter 
and derive analytical expressions for the weights and 
design points for the optimal Bayesian design. For 
more than one parameter and a dispersed prior dis- 
tribution, analytical results are extremely hard to 
obtain and numerical optimizations are so far the 
only route. 

Sebastiani and Settimi (1997) use the equivalence 
theorem to establish the local D-optimality of a two- 
point design suggested by Ford, Torsney and Wu 
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(1992) for a simple logistic model with design re- 
gion bounded at one end. They also suggest an effi- 
cient approximation to the D-optimal designs which 
requires less precise knowledge of the model param- 
eters. 

Matthews (1999) considered Bayesian designs for 
the logistic model with one qualitative factor at ±1 
and derived closed-form expressions for the weights 
and support points which minimize the asymptotic 
variance of the MLE of the log-odds ratio. He also 
studied the effect of prior specification on the design 
and noticed that as uncertainty on the log-odds ratio 
increases, the design becomes more unbalanced. 

6.5 More Than One Explanatory Variable 

Atkinson et al. (1995) consider a dose-response 
experiment when the male and female insects under 
study react differently and considered the model 

log(7r(x, z)/(l — tt(x, z))) = a + Ox + jz, 

where tt(x, z) is the probability of death of the insect 
at dose level x and z is for males and 1 for females. 
Assuming the proportion of males and females to be 
equal, they illustrate that the larger the separation 
between the two groups, the larger is the cardinal- 
ity of the support for the locally D-optimal design. 
They also consider a Bayesian version of this design 
problem by imposing a trivariate normal distribu- 
tion on the three parameters and notice that the 
design is robust with respect to uncertainty in the 
parameters for this problem. The paper also consid- 
ers designs for estimating ED95 for the two groups 
separately as well as the two groups combined to- 
gether. 

Sitter and Torsney (1995) consider locally D-op- 
timal designs when the model contains two quan- 
titative variables. Burridge and Sebastiani (1992) 
consider a generalized linear model with two de- 
sign variables and a linear predictor of the form 
r] = ax\ +6x2 and obtain locally D-optimal designs. 
Burridge and Sebastiani (1994) obtain D-optimal 
designs for a generalized linear model when obser- 
vations have variance proportional to the square of 
the mean. They do allow for any number of possi- 
ble predictors. However, their results are restricted 
to the case of power link functions. They establish 
that under certain conditions on the parameters of a 
model, the traditional change "one factor at a time" 
designs are D-optimal. They also conduct a numer- 
ical study to compare the efficiency of classical fac- 
torial designs to the optimal ones and suggest some 



efficient compromise designs. Sebastiani and Settimi 
(1998) obtain D-optimal designs for a variety of non- 
linear models with an arbitrary number of covariates 
under certain conditions on the Fisher information 
matrix. 

In a more recent article by Smith and Ridout 
(2003), optimal Bayesian designs are obtained for 
bioassays involving two parallel dose-response rela- 
tionships where the main interest is in estimating 
the relative potency of a test drug or test substance. 
They consider a model for the probability of a re- 
sponse as 

(6.7) -ir(x,z)=F(a + 9(x-pz)), 

where z is or 1 representing the two substances 
(called the standard and test substance, resp.) and 
F~ l is a link function. The parameter p is the rel- 
ative log potency of the test substance compared 
to the standard substance. Smith and Ridout con- 
sider local and Bayesian D-optimal designs as well 
as the D s -optimal design which is appropriate when 
interest is mainly in a subset of parameters (here 
9 and p), the others (here a) being considered as nui- 
sance parameters. The designs are obtained numer- 
ically and optimality is verified by using the corre- 
sponding directional derivative function. This model 
contains one quantitative and one qualitative predic- 
tor with no interaction, and as discussed in Chap- 
ter 13 of Atkinson and Donev (1992), the local D- 
optimal designs for the two substances (z = and 
z = 1) are identical. The number of support points 
for each substance is also the same as for the cor- 
responding local D-optimal design with a single- 
substance experiment. 

To illustrate the proposed designs, Smith and 
Ridout (2003) use a data set from Ashton (1972, 
page 59) which provides the number of Chrysan- 
themum aphids killed out of the number tested at 
different doses of two substances. One can model 
the response probability [as given in (6.7)] as a func- 
tion of x = log (dose) and z = factor representing the 
two substances. Smith and Ridout (2003) investi- 
gate the problem of finding optimal designs under 
various link functions, optimality criteria and prior 
choices. Since there are still very few illustrations of 
Bayesian optimal design in the multiparameter sit- 
uation, this numerical example is interesting in its 
own right. 

Smith and Ridout (2005) apply their work on mul- 
tiparameter dose-response problems to obtain 
Bayesian optimal design in a three-parameter binary 
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dose-response model with control mortality as a pa- 
rameter. In many bioassays where the response is 
often death, death may sometimes occur due to nat- 
ural causes of mortality unrelated to the stimulus. In 
such instances, the occurrence of response is due to 
natural mortality or control mortality. The model, 
including control mortality in a dose-response setup, 
as considered in Smith and Ridout (2005), is 

7r(x) = X + (1 - X)F[9{x - n)}. 

The control mortality parameter A is treated as a 
nuisance parameter and a wide range of prior distri- 
butions and criteria is investigated. 

A general treatment of Bayesian optimal designs 
with any number of explanatory variables seems to 
be in order. The computations get much more in- 
volved with an increase in the number of parameters 
as integration and optimization need to be carried 
out in a higher-dimensional space. The simulation- 
based methods proposed in Midler (1999) may be 
of particular importance in these high-dimensional 
problems. 

6.6 Miscellaneous Issues In Bayesian Design 

Multiresponse models. Draper and Hunter (1967) 
consider multiresponse experiments in nonlinear prob- 
lems and adopt locally optimal or sequential design 
as their design strategy. Very little work has been 
done for multiresponse experiments for GLMs us- 
ing Bayesian design ideas. Hatzis and Larntz (1992) 
consider a nonlinear multiresponse model with the 
probability distribution for the responses given by a 
Poisson random process. They consider locally 
D-optimal designs which minimize the generalized 
variance (volume of the confidence ellipsoid) of the 
estimated parameters for given specific values of the 
parameters. They also discuss the case when only a 
subset of the parameters is of interest, leading to 
the local Z) s -optimality criterion. They use a gener- 
alized simulated annealing algorithm along with the 
Nelder and Mead (1965) simplex algorithm. Obtain- 
ing Bayesian optimal designs for nonlinear multire- 
sponse models will certainly pose some computa- 
tional challenges as numerical integration needs to 
be carried out in a higher-dimensional space. Heise 
and Myers (1996) provides methods for producing 
D-optimal designs for bivariate logistic regression 
models. Zocchi and Atkinson (1999) uses D-optimal 
design theory to obtain designs for multinomial lo- 
gistic regression models. 



Sequential Bayesian designs. Ridout (1995) con- 
siders a limiting dilution model for a binary response 
in a seed testing experiment. Suppose that n sam- 
ples of seed are tested; the ith sample (i = 1, . . . , n) 
contains Xi seeds and yields a binary response vari- 
able yi, where y« = if the sample is free of infec- 
tion and yi = 1 otherwise. Let 7Tj = P(ith sample is 
free of infection) and 6 = the proportion of infected 
seeds in the population. Then the limiting dilution 
model is yi ~ Bernoulli (jTi) with 7T, = 1 — (1 — 6) Xi , 
i = 1, . . . , n. The author reparametrizes the problem 
in terms of A = log{— log(l — 6)} and the design cri- 
terion is based on expectations of functions of the 
Fisher information for A with respect to some uni- 
form prior. Single-stage and three-stage designs are 
developed and compared when the sample sizes are 
restricted to small numbers. The three-stage designs 
are found to be much more efficient than single-stage 
designs. This is an interesting example of multistage 
sequential Bayesian design for one-parameter non- 
linear problems where the sample size is constrained 
to be small. Mehrabi and Matthews (1998) also con- 
sider the problem of implementing Bayesian opti- 
mal designs for limiting dilution assay models. Za- 
cks (1977) proposes two-stage Bayesian designs for a 
similar problem. Freeman (1970), Owen (1975), Kuo 
(1983) and Berry and Fristedt (1985) have also done 
work in the sequential domain. 

Number of support points. In classical design the- 
ory, an upper bound on the number of support points 
for an optimal design is usually obtained by invoking 
Caratheodory's theorem, as the information matrix 
depends on a finite number of moments of the de- 
sign measure £. For D-optimality, the optimal design 
typically has as many support points as the num- 
ber of unknown parameters in the model with equal 
weights at each point (Silvey, 1980; Pukelsheim, 1993) 
This property helps to obtain the optimal design 
analytically but has the drawback that there is not 
a sufficient number of support points to allow for 
any goodness of fit checking. This type of upper 
bound result applies to local optimality criteria and 
Bayesian optimality criteria for linear models. For 
nonlinear models with a continuous prior distribu- 
tion there is no such bound available. The space 
of possible Fisher information matrices is now in- 
finite dimensional and Caratheodory's theorem can- 
not be invoked. For most concave optimality crite- 
ria, if the prior distribution has k support points, 
then there exists a Bayesian optimal design which 
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is supported on at most 

k p{p+i)/2 different points 

(Dette and Neugebauer, 1996), where p is the num- 
ber of parameters, but no such bounds are avail- 
able for a continuous prior distribution. Chaloner 
and Larntz (1989) first illustrated how the number 
of support points of the optimal design changes as 
the prior becomes more dispersed. This is an ad- 
vantage of the Bayesian design as it allows for pos- 
sible model checking with the observed data. How 
to incorporate model uncertainty in the paradigm 
remains an important unresolved issue (Steinberg, 
1985; DuMouchel and Jones, 1994). 

Sensitivity to prior specification. Robustness of the 
design to the prior distribution is a desirable prop- 
erty. DasGupta and Studden (1991), DasGupta, 
Mukhopadhyay and Studden (1992) and Toman and 
Gastwirth (1993, 1994) developed a framework for 
robust experimental design in a linear model setup. 
Efforts to propose robust Bayesian experimental de- 
signs for nonlinear models and generalized linear 
models are needed. 

Prior for inference. Tsutakawa (1972) argues that 
when Bayesian inference is considered appropriate, 
it may be desirable to use two separate priors, one 
for constructing designs and the other for subse- 
quent inference. Many practitioners believe in in- 
corporating prior information for constructing de- 
signs, but carry out the analysis through maximum 
likelihood or other frequentist procedures. Using a 
design prior with small variability and an inference 
prior with an inflated variance, as recommended in 
Tsutkawaka (1972), raises philosophical issues for 
discussion. Etzioni and Kadane (1993) and Lindley 
and Singpurwalla (1991) address this dichotomy of 
using informative priors for design and noninforma- 
tive priors for the subsequent statistical analysis. 

Statistical software. Finding optimal Bayesian de- 
signs for multiparameter nonlinear problems with a 
diffuse prior distribution is analytically very difficult 
and can only be obtained numerically. Chaloner and 
Larntz (1988) made the first effort in this direction. 
They introduced FORTRAN77 programs for obtain- 
ing Bayesian optimal designs for logistic regression 
with a single explanatory variable. Smith and Rid- 
out (1998) introduced an enhanced version of their 
program called DESIGNV1, which provides a wider 
range of link functions (not only logistic) as con- 
sidered in Ford, Torsney and Wu (1992) along with 
a greater ensemble of prior distributions and opti- 
mality criteria. The program SINGLE by Spears, 



Brown and Atkinson (1997), available in StatLib, 
also offers the logistic and log-log link functions with 
various choices of priors and an automated proce- 
dure to determine the number of support points. 
Smith and Ridout (2003) extended their software to 
DESIGNV2 to accommodate two explanatory vari- 
ables, one quantitative, the other dichotomous. All 
these programs use the Nelder-Mead (1965) opti- 
mization algorithm. The expectation over a prior 
distribution is computed by some numerical quadra- 
ture formulae (usually Gauss-Legendre or Gauss- 
Hermite quadrature). 

A flexible design software developed by Clyde 
(1993b) is built within XLISP-STAT (Tierney, 1990). 
This allows evaluation of exact and approximate 
Bayesian optimal designs for linear and nonlinear 
models. Locally optimal designs and non-Bayesian 
optimal designs for linear models can also be ob- 
tained as special cases of Bayesian designs. This re- 
quires the NPSOL FORTRAN library of Gill et al. 
(1986) to be installed in the system. The simulation- 
based ideas for obtaining optimal designs (Miiller, 
1999) can also be implemented through XLISP-STAT 

The use of Bayesian design depends greatly on up- 
dating and maintaining the existing programs and 
making them known to practitioners, in addition to 
including similar software in other standard statis- 
tical software packages. 

7. DESIGN COMPARISONS USING 
QUANTILE DISPERSION GRAPHS 

A fourth approach to the design dependence prob- 
lem was recently introduced by Robinson and Khuri 
(2003). Their approach is based on studying the dis- 
tribution of the mean-squared error of prediction 
(MSEP) in (3.7) throughout the experimental re- 
gion R. For a given design D, let Qnip, (3, X) de- 
note the pth. quantile of the distribution of MSEP 
on R\, where R\ represents the surface of a region 
obtained by shrinking the experimental region R us- 
ing a shrinkage factor A, and (3 is the parameter 
vector in the linear predictor in (2.2). By varying A 
we can cover the entire region R. In order to assess 
the problem of unknown (3, a parameter space C to 
which (3 is assumed to belong is specified. Subse- 
quently, the minimum and maximum of Qd(p, (3, A) 
over C are obtained. We therefore get the extrema 

QS ax (p,A) = max{Q D (p,/3,A)}, 

(7,1) QS in (P,A) = min{Q D (p,/3,A)}. 
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Plotting Q^ x {p, A) and Q™ n (p, A) against p results 
in the so-called quantile dispersion graphs (QDGs) 
of the MSEP. A desirable feature of a design D is to 
have close and small values of Q^ ax and Q™ n over 
the range of p (0 < p < 1). Small values of Q^ ax in- 
dicate small MSEP values on R\, and the closeness 
of <3£S ax and Qp m indicates robustness to changes 
in the values of (3 that is induced by the design D. 
The QDGs provide a comprehensive assessment of 
the prediction capability of D and can therefore be 
conveniently utilized to compare two candidate de- 
signs by comparing their graphical profiles. 
The QDG approach has several advantages: 

(1) The design's performance can be evaluated 
throughout the experimental region. Standard 
design optimality criteria base their evaluation 
of a design on a single measure, such as D- 
efficiency, but do not consider the quality of pre- 
diction inside R. 

(2) Estimation bias is taken into consideration in 
the comparison of designs. 

(3) The QDGs provide a clear depiction of the de- 
pendence of a design on the unknown parameter 
vector (3. Designs can therefore be easily com- 
pared with regard to robustness. 

(4) The use of the quantile plots of the MSEP per- 
mits the comparison of designs for GLMs with 
several control variables. 

7.1 Logistic Regression Models 

Robinson and Khuri (2003) applied the QDG ap- 
proach to comparing designs for logistic regression 
models of the form 

1 



(7.2) 7r(x) = - 

where f T (x)/3 defines the linear predictor in (2.2) 
and 7r(x) is the probability of success at x = [x\,X2, 
...,Xf.y, thus /i(x) = 7r(x). Robinson and Khuri 
showed that, in this case, (3.7) takes the form 

MSE[vf(x)] 

= tt 2 (x)[1 - ^(x)] 2 f T (x)(X T WX)~ 1 f(x) 

+ {vr(x)[l - 7r(x)]f T (x)(X T WX)" 1 X T WC 

+ ±7T( X )[ 1 - 7r ( X )][ 1 -Mx)] 

•f T (x)(X T WX)~ 1 f(x)} 2 , 

where W is the same as in (3.4) with w u = m u ir u (\ — 
tt u ), u = 1, 2, . . . ,ra, and £ is an n x 1 vector whose 



nth element is z uu (tt u — 0.5), where z uu is the nth di- 
agonal element of Z = X(X T WX) _1 X T . Here, m u 
denotes the number of experimental units tested at 
the nth experimental run, and tt u is the value of 
7r(x) at x u , the vector of design settings at the nth 
experimental run (n = 1,2, ... ,n). 

Robinson and Khuri considered quantiles of a scaled 
version of MSE[7r(x)], namely, 



(7.3) SMSE[^(x) 



N 



vr(x)[l-vr(x)] 



MSE[t?(x)], 



where N = 2~2u=i m u- Formula (7.1) can then be ap- 
plied with Qd(p, P, A) now denoting the pth quantile 
of SMSE[7r(x)] on the surface of R\. Two numeri- 
cal examples were presented in Robinson and Khuri 
(2003) to illustrate the application of the QDG ap- 
proach to comparing designs for logistic regression 
models. The following example provides another il- 
lustration of this approach in the case of logistic 
regression. 

Example 2. Sitter (1992) proposed a minimax 
procedure to obtain designs for the logistic regres- 
sion model (7.2), where 

f T (x)(3 = 9{x- u). 

This model is the same as the one given in (5.6). The 
designs proposed by Sitter are intended to be robust 
to poor initial estimates of u and 9. The numeri- 
cal example used by Sitter concerns sport fishing in 
British Columbia, Canada, where x is the amount 
of increased fishing cost. The binary response des- 
ignates fishing or not fishing for a given x. Thus 
n(x) is the probability of wanting to fish for a given 
increase in fishing cost. 

Using the D-optimality criterion, Sitter compared 
a locally D-optimal design against his minimax 
D-optimal design. The first design was based on the 
initial estimates uo = 40, #o = 0.90 for u and 9, and 
consisted of two points, namely, x = 38.28,41.72, 
with equal allocation to the points. For the second 
design, Sitter assumed the parameter space C : 33 < 
u < 47, 0.50 < 9 < 1.25. Accordingly, the minimax 
D-optimal design produced by Sitter entailed equal 
allocation to the points, x = 31.72, 34.48, 37.24, 40.00, 
42.76,45.52,48.28. This design was to be robust 
over C. 

The QDG approach was used to compare Sitter's 
two designs. The same parameter space, C, was con- 
sidered. The experimental region investigated was 
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R: 30 < x < 50, and the two designs consisted of 70 
runs each with equal weights at the design points. 

The scaled mean-squared error of prediction 
(SMSEP) is given in (7.3). For selected values of 
(//, 9) in C, values of SMSEP are calculated through- 
out the region R for each design. The maximum and 
minimum quantiles (over C) of the distribution of 
SMSEP on R are then obtained as in (7.1). Since 
in this example there is only one control variable, 
no shrinkage of the region R is necessary. The com- 
bined QDGs for the two designs are shown in Figure 
1. We note that the dispersion in the quantile val- 
ues for the minimax D-optimal design is less than 
that for the locally D-optimal design. This indicates 
more robustness of the former design to changes in 
the parameter values. This is consistent with the 
conclusion arrived at by Sitter that "for most of the 
region, the minimax D-optimal design performs bet- 
ter than the locally D-optimal design." 

8. CONCLUSION 

The research on designs for generalized linear mod- 
els is still very much in its developmental stage. 
Not much work has been accomplished either in 
terms of theory or in terms of computational meth- 
ods to evaluate the optimal design when the di- 
mension of the design space is high. The situations 
where one has several covariates (control variables) 
or multiple responses corresponding to each subject 
demand extensive work to evaluate "optimal" or at 



least efficient designs. The curve fitting approach 
of Miiller and Parmigiani (1995) may be one direc- 
tion to pursue in higher-dimensional design prob- 
lems. Finding robust and efficient designs in high- 
dimensional problems will involve formidable com- 
putational challenges and efficient search algorithms 
need to be developed. 

The stochastic approximation literature, as dis- 
cussed in Section 5, dwells primarily on the develop- 
ment of algorithms for the selection of design points. 
Similar ideas can be brought into case-control stud- 
ies where the prime objective is to study the associ- 
ation between a disease (say, lung cancer) and some 
exposure variables (such as smoking, residence near 
a hazardous waste site, etc.). Classical case-control 
studies are carried out by sampling separately from 
the case (persons affected with the disease) and con- 
trol (persons without the disease) populations, with 
the two sample sizes being fixed and often arbitrary. 
Chen (2000) proposed a sequential sampling proce- 
dure which removes this arbitrariness. Specifically, 
he proposed a sampling rule based on all the ac- 
cumulated data, which mandates whether the next 
observation (if any) should be drawn from a case or 
a control population. He showed also certain opti- 
mality of his proposed sampling rule. 

However, like much of the stochastic approxima- 
tion literature, Chen touched very briefly on the 
choice of a stopping rule, but without any optimal- 
ity properties associated with it. It appears that a 
Bayes stopping rule or some approximation thereof 




0.0 0.2 0.4 0.6 0.8 1.0 

P 

Fig. 1. Combined QDGs for the minimax D-optimal and D-optimal designs. 
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can be introduced along with Chen's sampling rule 
so that the issues of optimal stopping and choice of 
designs can be addressed simultaneously. 

The use of the quantile dispersion graphs (of the 
mean-squared error of prediction) provides a con- 
venient technique for evaluating and comparing de- 
signs for generalized linear models. The main advan- 
tages of these graphs are their applicability in ex- 
perimental situations involving several control vari- 
ables, their usefulness in assessing the quality of pre- 
diction associated with a given design throughout 
the experimental region, and their depiction of the 
design's dependence on the parameters of the fit- 
ted model. There are still several other issues that 
need to be resolved. For example, the effects of mis- 
specification of the link function and/or the parent 
distribution of the data on the shape of the quan- 
tile plots of the quantile dispersion graph approach 
need to be investigated. In addition, it would be of 
interest to explore the design dependence problem in 
multiresponse situations involving several response 
variables that may be correlated. The multiresponse 
design problem in a traditional linear model setup 
was discussed by Wijesinha and Khuri (1987a, b). 
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